A Programming Interface for NUMA Shared-Memory Clusters
Abstract
We describe a programming interface for parallel computing on NUMA (Non-Uniform Memory Access) shared-memory machines. Although interest in this architecture is growing rapidly and more and more hardware manufacturers offer products of this type, parallelization support is still lacking. We developed SMI, the Shared Memory Interface, and implemented it as a library on an SCI-coupled cluster of workstations. It aims to provide sophisticated support that accounts for NUMA performance characteristics and allows step-by-step parallelization. We show its application to the parallelization of a sparse matrix computation.
Similar Resources
Experiments with Cholesky Factorization on Clusters of SMPs
Cholesky factorization of large dense matrices is an integral part of many applications in science and engineering. In this paper we report on experiments with different parallel versions of Cholesky factorization on modern high-performance computing architectures. For the parallelization of Cholesky factorization we utilized various standard linear algebra software packages and present perform...
MPC: A Unified Parallel Runtime for Clusters of NUMA Machines
Over the last decade, the Message Passing Interface (MPI) has become a very successful parallel programming environment for distributed memory architectures such as clusters. However, the architecture of cluster nodes is currently evolving from small symmetric shared memory multiprocessors towards massively multicore, Non-Uniform Memory Access (NUMA) hardware. Although regular MPI implementations ar...
Implementing Transparent Shared Memory on Clusters Using Virtual Machines
Shared memory systems, such as SMP and ccNUMA topologies, simplify programming and administration. On the other hand, clusters of individual workstations are commonly used due to cost and scalability considerations. We have developed a virtual-machine-based solution, dubbed vNUMA, that seeks to provide a NUMA-like environment on a commodity cluster, with a single operating system instance and t...
Flexible Operating System Support for SCI Clusters
The bottleneck for many parallel and distributed applications on networks of workstations is the high cost of communication on traditional network interfaces. Memory-mapped network interfaces provide latencies of a few microseconds and bandwidths close to the maximum of the local I/O bus. Data is transferred directly between memories without involving the operating system, thereby inducing very...
OpenMP performance analysis for many-core platforms with non-uniform memory access
One of the first steps in embedded-system design flow is to choose the most efficient implementation of the embedded software application. However, this is difficult to do at the earliest design stages because particular details of the final manycore HW platform are usually unknown and many possible mappings of the software tasks/threads have to be evaluated. This paper presents a complete fram...
Publication date: 1997